In this post, we’ll walk through how to visualize the classic Palmer’s Penguins dataset by comparing the body mass of male and female penguins on each island.
Reading in the Data
First, we read the data from a URL into a pandas DataFrame and collect the names of the islands for later use.
import pandas as pd
url = "https://raw.githubusercontent.com/pic16b-ucla/24W/main/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)  # read in data
islands = penguins["Island"].unique()  # get an array of the distinct islands
penguins.head()
|  | studyName | Sample Number | Species | Region | Island | Stage | Individual ID | Clutch Completion | Date Egg | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) | Sex | Delta 15 N (o/oo) | Delta 13 C (o/oo) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PAL0708 | 1 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A1 | Yes | 11/11/07 | 39.1 | 18.7 | 181.0 | 3750.0 | MALE | NaN | NaN | Not enough blood for isotopes. |
| 1 | PAL0708 | 2 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A2 | Yes | 11/11/07 | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE | 8.94956 | -24.69454 | NaN |
| 2 | PAL0708 | 3 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A1 | Yes | 11/16/07 | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE | 8.36821 | -25.33302 | NaN |
| 3 | PAL0708 | 4 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A2 | Yes | 11/16/07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Adult not sampled. |
| 4 | PAL0708 | 5 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N3A1 | Yes | 11/16/07 | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE | 8.76651 | -25.32426 | NaN |
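The make_hist function below selects rows with boolean masks combined with `&`. As a small offline sketch of how that filtering behaves, here three columns from the five rows above are typed in by hand (the name `sample` is introduced only for this example). Note how the "Adult not sampled." row, whose Sex is NaN, drops out of both the male and the female selections.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of three columns from the five rows of penguins.head() shown above
sample = pd.DataFrame({
    "Island": ["Torgersen"] * 5,
    "Sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"],
    "Body Mass (g)": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
})

# unique() returns each distinct island once, in order of first appearance
islands = sample["Island"].unique()

# Build each condition as a boolean Series, then combine them with &
is_female = sample["Sex"] == "FEMALE"
on_island = sample["Island"] == "Torgersen"
female_mass = sample.loc[is_female & on_island, "Body Mass (g)"]

# Written inline, each comparison needs parentheses because & binds tighter than ==.
# NaN == "MALE" is False, so the row with a missing Sex is excluded here too.
male_mass = sample.loc[(sample["Sex"] == "MALE") & on_island, "Body Mass (g)"]

print(female_mass.tolist())  # [3800.0, 3250.0, 3450.0]
print(male_mass.tolist())    # [3750.0]
```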
Plotting
Now, we plot our data using matplotlib.
import matplotlib.pyplot as plt # import pyplot
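After importing pyplot, we can build up to the final figure: a grid of body-mass histograms with one row per island, males in the left column and females in the right. The first piece is the make_hist function, which draws the two histograms for a single row.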
def make_hist(row, island):
    '''
    Put the body-mass data for one island into histograms, one row at a time:
    males in the left column, females in the right column.
    Draws on the global `ax` array created by plt.subplots below.
    '''
    on_island = penguins["Island"] == island
    # make histogram for males on the left from a specific island
    ax[row, 0].hist(penguins.loc[(penguins["Sex"] == "MALE") & on_island, "Body Mass (g)"])
    # make histogram for females on the right from a specific island
    ax[row, 1].hist(penguins.loc[(penguins["Sex"] == "FEMALE") & on_island, "Body Mass (g)"], color="orange")
    # set consistent limits on both histograms
    for col in (0, 1):
        ax[row, col].set_xlim([2500, 6500])
        ax[row, col].set_ylim([0, 25])
These shared limits matter. Left alone, each histogram would autoscale its axes to its own data, so the same bar height or horizontal position could mean a different count or mass in each panel, and comparing islands or sexes by eye would be misleading.
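Matching axis limits fix the scale, but the bins can still differ: with the default `bins=10`, each `hist` call spreads its ten bins between the minimum and maximum of that panel's own data, so bar widths vary from panel to panel. A sketch of one alternative, not what this post does: pass a single shared array of bin edges to every call, and let `plt.subplots(..., sharex=True, sharey=True)` keep the limits in sync. The 250 g bin width and the second group of masses are hypothetical values chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Group A: masses from the head() rows; group B: hypothetical values for illustration
group_a = [3750.0, 3800.0, 3250.0, 3450.0]
group_b = [4500.0, 5200.0, 5700.0, 6000.0]

# One set of bin edges shared by every panel: 2500 g to 6500 g in 250 g steps
bins = np.arange(2500, 6501, 250)

# sharex/sharey make both panels use the same axis limits automatically
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(8, 3))
counts_a, edges_a, _ = axes[0].hist(group_a, bins=bins)
counts_b, edges_b, _ = axes[1].hist(group_b, bins=bins, color="orange")

# Setting the limit on one shared axis also updates its sibling
axes[0].set_xlim(2500, 6500)
```

Because both panels use identical bin edges, a bar at a given position covers the same mass range everywhere, which is what makes side-by-side comparison fair.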
Next, we write a function that labels the axes and titles each histogram.
def set_text(row, island):
    '''
    Label the axes and title both histograms in one row of the final figure.
    '''
    # set the labels for the x-axis
    ax[row, 0].set_xlabel("Body Mass (g)")
    ax[row, 1].set_xlabel("Body Mass (g)")
    # set the labels for the y-axis
    ax[row, 0].set_ylabel("Number of Penguins")
    ax[row, 1].set_ylabel("Number of Penguins")
    # set the titles for the histograms, using the island passed in
    ax[row, 0].set_title("Male Penguins from " + island)
    ax[row, 1].set_title("Female Penguins from " + island)
Now we combine these two functions into a single function that builds one complete row.
def make_row(row, island):
    '''
    Completely create one row of the figure by plotting the data and labeling it.
    '''
    make_hist(row, island)
    set_text(row, island)
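make_hist and set_text both draw on the global `ax` array, which only exists once the figure is created at the end. A sketch of an alternative design, not the post's code: a hypothetical `make_row_on` that receives the pair of Axes and the data as arguments, so it can draw on any figure. It uses a hand-typed subset of the rows shown earlier, and the male color `"C0"` (matplotlib's first default cycle color) is an assumption meant to match the default look.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hand-typed subset of the rows shown in penguins.head() (Torgersen only)
penguins = pd.DataFrame({
    "Island": ["Torgersen"] * 5,
    "Sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"],
    "Body Mass (g)": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
})

def make_row_on(axes_row, data, island):
    """Hypothetical variant of make_row: plots and labels one island's row
    on the two Axes passed in, instead of reading a global `ax`."""
    on_island = data["Island"] == island
    for axis, sex, color in zip(axes_row, ["MALE", "FEMALE"], ["C0", "orange"]):
        mass = data.loc[on_island & (data["Sex"] == sex), "Body Mass (g)"]
        axis.hist(mass, color=color)
        # same fixed limits and labels as the post's functions
        axis.set_xlim([2500, 6500])
        axis.set_ylim([0, 25])
        axis.set_xlabel("Body Mass (g)")
        axis.set_ylabel("Number of Penguins")
        axis.set_title(f"{sex.capitalize()} Penguins from {island}")

fig, (left, right) = plt.subplots(1, 2, figsize=(13, 5))
make_row_on([left, right], penguins, "Torgersen")
```

Passing the Axes in explicitly means the function no longer depends on the order in which the figure and the helpers are defined.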
The last thing to do is to create our subplots, make a row for each island, and top it off by titling the whole figure.
# create one column for males, one for females, and a row for each island
fig, ax = plt.subplots(len(islands), 2, figsize=(13, 17))
for index, island in enumerate(islands):
    make_row(index, island)  # make each row
fig.suptitle("Body Mass (g) of Palmer's Penguins")  # set title of the whole figure
plt.show()
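The loop indexes `ax[row, 0]` and `ax[row, 1]`, which works because `plt.subplots(len(islands), 2)` returns a 2-D array of Axes when there is more than one row, and the dataset has three islands. A short sketch of the edge case worth knowing: with a single row, the default `squeeze=True` returns a 1-D array instead, and passing `squeeze=False` keeps the result 2-D.

```python
import matplotlib.pyplot as plt

# Three islands -> a (3, 2) array, so ax[row, col] indexing works
fig, ax = plt.subplots(3, 2)
print(ax.shape)   # (3, 2)

# One row -> the array is squeezed to 1-D, so ax1[0, 0] would raise IndexError
fig1, ax1 = plt.subplots(1, 2)
print(ax1.shape)  # (2,)

# squeeze=False always returns a 2-D array, keeping ax2[row, col] valid
fig2, ax2 = plt.subplots(1, 2, squeeze=False)
print(ax2.shape)  # (1, 2)
```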